A Hashed Schema for Similarity Search in Metric Spaces (invited talk)

نویسنده

  • Pavel Zezula
چکیده

A novel access structure for similarity search in metric data, called Similarity Hashing (SH), is proposed. Its multi-level hash structure of separable buckets on each level supports easy insertion and bounded search costs, because at most one bucket needs to be accessed at each level for range queries up to a pre-de ned value of search radius. At the same time, the number of distance computations is always signi cantly reduced by use of pre-computed distances obtained at insertion time. Buckets of static les can be arranged in such a way that the I/O costs never exceed the costs to scan a compressed sequential le. Experimental results demonstrate that the performance of SH is superior to the available tree-based structures. Contrary to tree organizations, the SH structure is suitable for distributed and parallel implementations.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Composite Kernel Optimization in Semi-Supervised Metric

Machine-learning solutions to classification, clustering and matching problems critically depend on the adopted metric, which in the past was selected heuristically. In the last decade, it has been demonstrated that an appropriate metric can be learnt from data, resulting in superior performance as compared with traditional metrics. This has recently stimulated a considerable interest in the to...

متن کامل

A Content-Addressable Network for Similarity Search in Metric Spaces

Because of the ongoing digital data explosion, more advanced search paradigms than the traditional exact match are needed for contentbased retrieval in huge and ever growing collections of data produced in application areas such as multimedia, molecular biology, marketing, computer-aided design and purchasing assistance. As the variety of data types is fast going towards creating a database uti...

متن کامل

Access Structures for Advanced Similarity Search in Metric Spaces

Similarity retrieval is an important paradigm for searching in environments where exact match has little meaning. Moreover, in order to enlarge the set of data types for which the similarity search can efficiently be performed, the notion of mathematical metric space provides a useful abstraction for similarity. In this paper we consider the problem of organizing and searching large data-sets f...

متن کامل

Fast Information-Theoretic Agglomerative Co-clustering

Our algorithm iteratively merges those clusters whose merge yields a lower objective cost. However, operations such as finding nearest neighbors or closest pair of clusters are expensive, especially in high dimensions. To quickly find highly similar clusters to be merged, we exploit the Locality-Sensitive Hashing (LSH) technique, which we briefly describe in this section. Simply put, LSH [2] is...

متن کامل

New Approaches to Similarity Searching in Metric Spaces

Title of dissertation: NEW APPROACHES TO SIMILARITY SEARCHING IN METRIC SPACES Cengiz Celik, Doctor of Philosophy, 2006 Dissertation directed by: Professor David Mount Department of Computer Science The complex and unstructured nature of many types of data, such as multimedia objects, text documents, protein sequences, requires the use of similarity search techniques for retrieval of informatio...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2000